Spectral clustering with eigenvector selection based on entropy ranking
Abstract
The Ng–Jordan–Weiss (NJW) method is one of the most widely used spectral clustering algorithms. For a K-way clustering problem, it partitions the data using the K largest eigenvectors (those with the largest eigenvalues) of the normalized affinity matrix derived from the dataset. It has been demonstrated that the spectral relaxation solution of K-way grouping lies in the subspace spanned by the K largest eigenvectors. However, extensive experiments show that the top K eigenvectors cannot always detect the structure of the data in real pattern recognition problems, so it is necessary to select eigenvectors for spectral clustering. We propose an eigenvector selection method based on entropy ranking for spectral clustering (ESBER). In this method, all eigenvectors are first ranked according to their importance for clustering, and a suitable eigenvector combination is then obtained from the ranking list. We propose two strategies for selecting eigenvectors from the ranking list. The first directly adopts the first K eigenvectors in the ranking list. Unlike the K largest eigenvectors used by the NJW method, these are the K most important eigenvectors among all the eigenvectors. The second strategy searches for a suitable eigenvector combination among the first Km (Km > K) eigenvectors in the ranking list. The eigenvector combination obtained by this strategy can reflect the structure of the original data and lead to a satisfactory spectral clustering result. Furthermore, we present strategies for reducing the computational complexity of ESBER so that it can handle large-scale datasets. We performed experiments on UCI benchmark datasets, MNIST handwritten digit datasets, and Brodatz texture datasets, using the NJW method as a baseline. The experimental results show that ESBER is more robust than the NJW method. In particular, ESBER with the second eigenvector selection strategy can obtain satisfactory clustering results in most...
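The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. The NJW steps (Gaussian affinity, normalized affinity matrix, row-normalized eigenvector embedding, k-means) follow the standard formulation. The abstract does not state the entropy measure, so `pairwise_entropy` is an assumed stand-in score based on pairwise similarities along a single eigenvector. All function names, the `sigma` value, and the toy data are hypothetical.

```python
import numpy as np


def njw_eigenvectors(X, sigma=1.0):
    """Eigenpairs of the NJW normalized affinity matrix, largest eigenvalue first."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))          # Gaussian affinity
    np.fill_diagonal(A, 0.0)                           # NJW sets A_ii = 0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    vals, vecs = np.linalg.eigh(L)                     # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def pairwise_entropy(v):
    """ASSUMED score (not taken from the paper): entropy of pairwise similarities."""
    i, j = np.triu_indices(len(v), k=1)
    dist = np.abs(v[i] - v[j])
    mean_dist = dist.mean()
    if mean_dist == 0.0:
        return float("inf")                            # constant vector: rank it last
    s = np.exp(-np.log(2.0) * dist / mean_dist)        # similarity 0.5 at the mean distance
    s = np.clip(s, 1e-12, 1.0 - 1e-12)                 # avoid log(0)
    return float(-np.sum(s * np.log(s) + (1.0 - s) * np.log(1.0 - s)))


def rank_by_entropy(vecs):
    """Column indices ordered from lowest to highest assumed entropy score."""
    scores = np.array([pairwise_entropy(vecs[:, c]) for c in range(vecs.shape[1])])
    return np.argsort(scores, kind="stable")


def kmeans(Y, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        new = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_with_columns(vecs, columns, k):
    """NJW-style final step on a chosen set of eigenvector columns."""
    Y = vecs[:, columns]
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)  # unit-length rows
    return kmeans(Y, k)


# Toy data: two well-separated Gaussian blobs (hypothetical example).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])

vals, vecs = njw_eigenvectors(X, sigma=1.0)
njw_labels = cluster_with_columns(vecs, [0, 1], k=2)             # largest K eigenvectors
ranked = rank_by_entropy(vecs)
esber_like_labels = cluster_with_columns(vecs, ranked[:2], k=2)  # first K in the ranking
```

The last two lines mirror the first selection strategy in the abstract: take the first K eigenvectors from a ranking list instead of the K largest. Because the ranking score here is only a stand-in, its output should not be read as reproducing the paper's results.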
Related resources
A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of electrocardiogram signal classification and segmentation methodologies indicates that algorithms that can effectively handle the nonstationarity and uncertainty of the signals should be used for ECG analysis. Ex...
Spectral clustering with eigenvector selection
The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm, which clusters data using eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) How to automat...
A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection
Considering the concept of clustering, the main idea of the present study is that the stocks to be selected and ranked will not necessarily all fall in one cluster. With this in mind, the study aims to offer a new methodology for making decisions about forming a portfolio of stocks in the stock market. To this end, Multiple-Criteria Decisi...
Spectral 3D mesh segmentation with a novel single segmentation field
We present an automatic mesh segmentation framework, which achieves 3D segmentation in two stages, comprising hierarchical spectral analysis and isoline-based boundary detection. During hierarchical spectral analysis, a novel single segmentation field is defined to capture concavity-aware decompositions of eigenvectors from a concavity-aware Laplacian. Specifically, on the eigenvector hierarchy,...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Journal: Neurocomputing
Volume: 73
Publication date: 2010